extend router replay by faresobeid · Pull Request #2703 · PrimeIntellect-ai/prime-rl

faresobeid · 2026-06-04T00:21:58Z

Replayed experts are kept only if the trainer's score for them is at least the score of the trainer's weakest top-k expert multiplied by the ratio.

Note

Medium Risk
Changes MoE expert selection on the training forward path when filtering is enabled, which can alter gradients and load balancing; default-off config limits blast radius.

Overview
Adds optional plausibility filtering for MoE router replay during RL training. With trainer.router_replay_score_threshold_ratio set above 0, each inference-replayed expert is kept only if the trainer router’s gate score for that expert is at least that fraction of the trainer’s weakest top-k score for the token; rejected slots are backfilled from the trainer’s own top-k picks. The default 0 leaves behavior unchanged (strict replay of inference routing).
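The filtering rule above can be sketched for a single token in plain Python. This is an illustrative sketch only, not the code in this PR: the function name `filter_replayed_experts`, its arguments, and the choice to backfill rejected slots in place with the highest-scoring unused trainer pick are all assumptions. The actual logic lives in TokenChoiceTopKRouter and operates on batched tensors.

```python
from typing import List


def filter_replayed_experts(
    trainer_scores: List[float],  # trainer gate score per expert, for one token
    replayed: List[int],          # k expert ids chosen by the inference router
    ratio: float,                 # hypothetical stand-in for router_replay_score_threshold_ratio
) -> List[int]:
    """Keep plausible replayed experts; backfill the rest from the trainer's top-k."""
    k = len(replayed)
    if ratio <= 0:
        # Default behavior: strict replay of inference routing.
        return list(replayed)

    # Trainer's own top-k picks, strongest first.
    trainer_topk = sorted(
        range(len(trainer_scores)),
        key=lambda e: trainer_scores[e],
        reverse=True,
    )[:k]
    # Threshold is a fraction of the trainer's weakest top-k score.
    threshold = ratio * trainer_scores[trainer_topk[-1]]

    kept = [e if trainer_scores[e] >= threshold else None for e in replayed]
    used = {e for e in kept if e is not None}

    # Assumption: rejected slots are filled in place, strongest unused pick first.
    backfill = (e for e in trainer_topk if e not in used)
    return [e if e is not None else next(backfill) for e in kept]


scores = [0.05, 0.40, 0.30, 0.02, 0.23]
print(filter_replayed_experts(scores, [1, 3], ratio=0.5))  # expert 3 is rejected
print(filter_replayed_experts(scores, [1, 3], ratio=0.0))  # strict replay
```

With these scores and k = 2 the trainer's top-k is experts 1 and 2, so the threshold at ratio 0.5 is 0.15. Expert 3 (score 0.02) falls below it and its slot is backfilled with expert 2, while ratio 0 returns the replayed experts untouched.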

Wiring: new trainer config field, configure_router_replay_filter applied at model init when router replay is on, and logic in TokenChoiceTopKRouter plus docs for the inference/trainer TOML knobs. torch.histc inputs are cast to float where needed.

Reviewed by Cursor Bugbot for commit 7e7f36f. Bugbot is set up for automated code reviews on this repo.

extend router replay

7e7f36f

faresobeid wants to merge 1 commit into main from router_replay_extend
